Building Efficient Query Engines in a High-Level Language
In this paper we advocate that it is time for a radical rethinking of database systems design. Developers should be able to leverage high-level programming languages without having to pay a price in efficiency. To realize our vision of abstraction without regret, we present LegoBase, a query engine written in the high-level programming language Scala. The key technique to regain efficiency is to apply generative programming: the Scala code that constitutes the query engine, despite its high-level appearance, is actually a program generator that emits specialized, low-level C code. We show how the combination of high-level and generative programming makes it easy to implement a wide spectrum of optimizations that are difficult to achieve with existing low-level query compilers, and how it enables continuous optimization of the query engine. We evaluate our approach with the TPC-H benchmark and show that: (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database system as well as an existing query compiler; (b) these performance improvements require programming just a few hundred lines of high-level code instead of the complicated low-level code required by existing query compilers; and (c) the compilation overhead is low compared to the overall execution time, making our approach practical for efficiently compiling query engines.
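To make the generative-programming idea concrete, here is a minimal, self-contained Scala sketch. It is not LegoBase's actual code or API (the names Field and emitScan are illustrative): a high-level operator definition does not execute a scan itself but emits specialized, low-level C for one specific schema and predicate.

```scala
// Minimal sketch of generative programming: a high-level "operator" does not
// execute a scan itself; it emits specialized, low-level C.
// Names (Field, emitScan) are illustrative, not LegoBase's actual API.

case class Field(name: String, cType: String)

/** Emits a C loop that scans `table` rows and keeps those matching `pred`. */
def emitScan(table: String, fields: Seq[Field], pred: String): String = {
  val struct = fields.map(f => s"  ${f.cType} ${f.name};").mkString("\n")
  s"""
  |struct ${table}_row {
  |$struct
  |};
  |
  |void scan_$table(struct ${table}_row* rows, long n) {
  |  for (long i = 0; i < n; i++) {
  |    if ($pred) {
  |      consume(&rows[i]);   /* hand the tuple to the parent operator */
  |    }
  |  }
  |}
  |""".stripMargin
}

// Specializing the generator per query yields schema-specific C with no
// interpretation overhead, while the engine itself stays high-level Scala.
println(emitScan("lineitem",
  Seq(Field("l_quantity", "double"), Field("l_discount", "double")),
  "rows[i].l_quantity < 24"))
```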
Properties of Healthcare Teaming Networks as a Function of Network Construction Algorithms
Network models of healthcare systems can be used to examine how providers
collaborate, communicate, and refer patients to each other. Most healthcare service
network models have been constructed from patient claims data, using billing
claims to link patients with providers. The data sets can be quite large,
making standard methods for network construction computationally challenging
and thus requiring the use of alternate construction algorithms. While these
alternate methods have seen increasing use in generating healthcare networks,
there is little to no literature comparing the differences in the structural
properties of the generated networks. To address this issue, we compared the
properties of healthcare networks constructed using different algorithms and
the 2013 Medicare Part B outpatient claims data. Three different algorithms
were compared: binning, sliding frame, and trace-route. Unipartite networks
linking either providers or healthcare organizations by shared patients were
built using each method. We found that each algorithm produced networks with
substantially different topological properties. Provider networks adhered to a
power law, and organization networks to a power law with exponential cutoff.
Censoring networks to exclude edges with fewer than 11 shared patients, a common
de-identification practice for healthcare network data, markedly reduced edge
numbers and greatly altered measures of vertex prominence such as the
betweenness centrality. We identified patterns in the distance patients travel
between network providers, most strikingly between providers in the
Northeast United States and Florida. We conclude that the choice of network
construction algorithm is critical for healthcare network analysis, and discuss
the implications for selecting the algorithm best suited to the type of
analysis to be performed.
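As a rough illustration of the construction and censoring steps described above (this is not the paper's code; Claim, providerNetwork, and the 11-patient threshold parameter are hypothetical names), the following Scala sketch projects patient-provider claims into a unipartite provider network linked by shared patients and then drops edges with fewer than 11 shared patients:

```scala
// Sketch (not the paper's implementation) of projecting patient-provider claims
// into a unipartite provider network linked by shared patients, then censoring
// edges supported by fewer than `minShared` patients.

// A claim is simplified to (patient, provider); field names are illustrative.
case class Claim(patient: String, provider: String)

def providerNetwork(claims: Seq[Claim], minShared: Int = 11): Map[(String, String), Int] = {
  // Group providers by patient: each patient contributes one co-occurrence
  // per provider pair that billed for them.
  val byPatient: Map[String, Set[String]] =
    claims.groupBy(_.patient).map { case (p, cs) => p -> cs.map(_.provider).toSet }

  val pairCounts = scala.collection.mutable.Map.empty[(String, String), Int]
  for {
    providers <- byPatient.values
    pair      <- providers.toSeq.sorted.combinations(2)  // undirected edge, canonical order
  } {
    val key = (pair(0), pair(1))
    pairCounts(key) = pairCounts.getOrElse(key, 0) + 1
  }

  // Censor (drop) edges with fewer than `minShared` shared patients.
  pairCounts.filter { case (_, shared) => shared >= minShared }.toMap
}
```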
Building-Blocks for Performance Oriented DSLs
Domain-specific languages raise the level of abstraction in software
development. While it is evident that programmers can more easily reason about
very high-level programs, the same holds for compilers only if the compiler has
an accurate model of the application domain and the underlying target platform.
Since mapping high-level, general-purpose languages to modern, heterogeneous
hardware is becoming increasingly difficult, DSLs are an attractive way to
capitalize on improved hardware performance, precisely by making the compiler
reason on a higher level. Implementing efficient DSL compilers is a daunting
task, however, and support for building performance-oriented DSLs is urgently
needed. To this end, we present the Delite Framework, an extensible toolkit
that drastically simplifies building embedded DSLs and compiling DSL programs
for execution on heterogeneous hardware. We discuss several building blocks in
some detail and present experimental results for the OptiML machine-learning
DSL implemented on top of Delite.
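The following toy Scala sketch, which does not use Delite's actual API, illustrates the kind of building block the abstract alludes to: embedded-DSL operations construct an intermediate representation instead of executing eagerly, so a compiler stage can rewrite the program (here, fusing element-wise maps) before it is executed or mapped onto hardware.

```scala
// Toy illustration (not Delite's API) of an embedded DSL: operations build an
// IR that a compiler pass can optimize before the program runs.

sealed trait Exp
case class VectorLit(xs: Vector[Double])      extends Exp
case class Map1(f: Double => Double, in: Exp) extends Exp   // element-wise op
case class Sum(in: Exp)                       extends Exp

// One simple optimization pass: fuse two element-wise maps into one traversal.
def fuse(e: Exp): Exp = e match {
  case Map1(g, Map1(f, in)) => fuse(Map1(x => g(f(x)), in))
  case Sum(in)              => Sum(fuse(in))
  case other                => other
}

// A naive interpreter standing in for the code-generation/back-end stage.
def eval(e: Exp): Vector[Double] = e match {
  case VectorLit(xs) => xs
  case Map1(f, in)   => eval(in).map(f)
  case Sum(in)       => Vector(eval(in).sum)
}

val prog = Sum(Map1(_ * 2.0, Map1(_ + 1.0, VectorLit(Vector(1.0, 2.0, 3.0)))))
println(eval(fuse(prog)))   // Vector(18.0)
```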
Spoofax at Oracle: Domain-Specific Language Engineering for Large-Scale Graph Analytics
For the last decade, teams at Oracle have relied on the Spoofax language workbench to develop a family of domain-specific languages for graph analytics, both in research projects and in product development. In this paper, we analyze the requirements for integrating language processors into large-scale graph analytics toolkits and for developing these language processors as part of a larger product development process. We discuss how Spoofax helps to meet these requirements and point out the need for future improvements.
The LDBC Graphalytics Benchmark
In this document, we describe LDBC Graphalytics, an industrial-grade
benchmark for graph analysis platforms. The main goal of Graphalytics is to
enable the fair and objective comparison of graph analysis platforms. Due to
the diversity of bottlenecks and performance issues such platforms need to
address, Graphalytics consists of a set of selected deterministic algorithms
for full-graph analysis, standard graph datasets, synthetic dataset generators,
and reference output for validation purposes. Its test harness produces deep
metrics that quantify multiple kinds of system scalability, both weak and
strong, and robustness against failures and performance variability. The benchmark
also balances comprehensiveness with the runtime necessary to obtain these deep
metrics. The benchmark comes with open-source software for generating
performance data, for validating algorithm results, for monitoring and sharing
performance data, and for obtaining the final benchmark result as a standard
performance report.
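As a small illustration of reference-output validation (this is not the Graphalytics harness; BFS is used here only as an example of a deterministic full-graph algorithm), the reference output can be computed once on a small graph and a platform's result compared against it exactly:

```scala
// Sketch of reference-output validation in the spirit of the benchmark:
// run a deterministic algorithm (BFS depths here) and compare a platform's
// output against the reference. Names and types are illustrative.

import scala.collection.mutable

def bfsDepths(adj: Map[Long, Seq[Long]], source: Long): Map[Long, Long] = {
  val depth = mutable.Map(source -> 0L)
  val queue = mutable.Queue(source)
  while (queue.nonEmpty) {
    val v = queue.dequeue()
    for (w <- adj.getOrElse(v, Seq.empty) if !depth.contains(w)) {
      depth(w) = depth(v) + 1
      queue.enqueue(w)
    }
  }
  depth.toMap
}

/** A result is valid if it matches the reference exactly (BFS depth is deterministic). */
def validate(platformOutput: Map[Long, Long], reference: Map[Long, Long]): Boolean =
  platformOutput == reference

val graph     = Map(1L -> Seq(2L, 3L), 2L -> Seq(4L), 3L -> Seq(4L), 4L -> Seq.empty[Long])
val reference = bfsDepths(graph, source = 1L)   // Map(1 -> 0, 2 -> 1, 3 -> 1, 4 -> 2)
```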
LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms
In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness in the face of failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, vendors perform the tuning and benchmarking of their platforms.
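To make the scalability metrics concrete, here is a generic Scala sketch (not the Graphalytics harness and not its exact metric definitions): strong scaling keeps the dataset fixed while adding machines, weak scaling grows the dataset with the machine count, and in both cases the basic quantity is a ratio against a single-machine baseline run.

```scala
// Generic sketch (not the benchmark's code) of speedup and parallel-efficiency
// ratios of the kind the scalability metrics are built from. Names are illustrative.

final case class Run(machines: Int, scaleFactor: Int, seconds: Double)

/** Speedup relative to the single-machine baseline (strong scaling keeps scaleFactor fixed). */
def speedup(baseline: Run, run: Run): Double = baseline.seconds / run.seconds

/** Parallel efficiency: speedup divided by the number of machines used. */
def efficiency(baseline: Run, run: Run): Double = speedup(baseline, run) / run.machines

val base   = Run(machines = 1, scaleFactor = 1, seconds = 120.0)
val strong = Run(machines = 8, scaleFactor = 1, seconds = 18.0)   // same dataset, 8 machines
println(f"strong speedup: ${speedup(base, strong)}%.2fx, efficiency: ${efficiency(base, strong)}%.2f")
```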